Goto

Collaborating Authors

 target room


CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations

Wu, Pengying, Mu, Yao, Zhou, Kangjie, Ma, Ji, Chen, Junting, Liu, Chang

arXiv.org Artificial Intelligence

Visual navigation tasks are critical for household service robots. As these tasks become increasingly complex, effective communication and collaboration among multiple robots become imperative to ensure successful completion. In recent years, large language models (LLMs) have exhibited remarkable comprehension and planning abilities in the context of embodied agents. However, their application in household scenarios, specifically in the use of multiple agents collaborating to complete complex navigation tasks through communication, remains unexplored. Therefore, this paper proposes a framework for decentralized multi-agent navigation, leveraging LLM-enabled communication and collaboration. By designing the communication-triggered dynamic leadership organization structure, we achieve faster team consensus with fewer communication instances, leading to better navigation effectiveness and collaborative exploration efficiency. With the proposed novel communication scheme, our framework promises to be conflict-free and robust in multi-object navigation tasks, even when there is a surge in team size.


Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation

Hoftijzer, Dennis, Burghouts, Gertjan, Spreeuwers, Luuk

arXiv.org Artificial Intelligence

Deep Reinforcement Learning (DRL) has shown great potential in enabling robots to find certain objects (e.g., `find a fridge') in environments like homes or schools. This task is known as Object-Goal Navigation (ObjectNav). DRL methods are predominantly trained and evaluated using environment simulators. Although DRL has shown impressive results, the simulators may be biased or limited. This creates a risk of shortcut learning, i.e., learning a policy tailored to specific visual details of training environments. We aim to deepen our understanding of shortcut learning in ObjectNav, its implications and propose a solution. We design an experiment for inserting a shortcut bias in the appearance of training environments. As a proof-of-concept, we associate room types to specific wall colors (e.g., bedrooms with green walls), and observe poor generalization of a state-of-the-art (SOTA) ObjectNav method to environments where this is not the case (e.g., bedrooms with blue walls). We find that shortcut learning is the root cause: the agent learns to navigate to target objects, by simply searching for the associated wall color of the target object's room. To solve this, we propose Language-Based (L-B) augmentation. Our key insight is that we can leverage the multimodal feature space of a Vision-Language Model (VLM) to augment visual representations directly at the feature-level, requiring no changes to the simulator, and only an addition of one layer to the model. Where the SOTA ObjectNav method's success rate drops 69%, our proposal has only a drop of 23%.


Household navigation and manipulation for everyday object rearrangement tasks

Iyer, Shrutheesh R., Pal, Anwesan, Hu, Jiaming, Adeleye, Akanimoh, Aggarwal, Aditya, Christensen, Henrik I.

arXiv.org Artificial Intelligence

We consider the problem of building an assistive robotic system that can help humans in daily household cleanup tasks. Creating such an autonomous system in real-world environments is inherently quite challenging, as a general solution may not suit the preferences of a particular customer. Moreover, such a system consists of multi-objective tasks comprising -- (i) Detection of misplaced objects and prediction of their potentially correct placements, (ii) Fine-grained manipulation for stable object grasping, and (iii) Room-to-room navigation for transferring objects in unseen environments. This work systematically tackles each component and integrates them into a complete object rearrangement pipeline. To validate our proposed system, we conduct multiple experiments on a real robotic platform involving multi-room object transfer, user preference-based placement, and complex pick-and-place tasks. Project page: https://sites.google.com/eng.ucsd.edu/home-robot


Data-driven HVAC Control Using Symbolic Regression: Design and Implementation

Ozawa, Yuki, Zhao, Dafang, Watari, Daichi, Taniguchi, Ittetsu, Suzuki, Toshihiro, Shimoda, Yoshiyuki, Onoye, Takao

arXiv.org Artificial Intelligence

The large amount of data collected in buildings makes energy management smarter and more energy efficient. This study proposes a design and implementation methodology of data-driven heating, ventilation, and air conditioning (HVAC) control. Building thermodynamics is modeled using a symbolic regression model (SRM) built from the collected data. Additionally, an HVAC system model is also developed with a data-driven approach. A model predictive control (MPC) based HVAC scheduling is formulated with the developed models to minimize energy consumption and peak power demand and maximize thermal comfort. The performance of the proposed framework is demonstrated in the workspace in the actual campus building. The HVAC system using the proposed framework reduces the peak power by 16.1\% compared to the widely used thermostat controller.


Interactive Constrained MAP-Elites Analysis and Evaluation of the Expressiveness of the Feature Dimensions

Alvarez, Alberto, Dahlskog, Steve, Font, Jose, Togelius, Julian

arXiv.org Artificial Intelligence

We propose the Interactive Constrained MAP-Elites, a quality-diversity solution for game content generation, implemented as a new feature of the Evolutionary Dungeon Designer: a mixed-initiative co-creativity tool for designing dungeons. The feature uses the MAP-Elites algorithm, an illumination algorithm that segregates the population among several cells depending on their scores with respect to different behavioral dimensions. Users can flexibly and dynamically alternate between these dimensions anytime, thus guiding the evolutionary process in an intuitive way, and then incorporate suggestions produced by the algorithm in their room designs. At the same time, any modifications performed by the human user will feed back into MAP-Elites, closing a circular workflow of constant mutual inspiration. This paper presents the algorithm followed by an in-depth analysis of its behaviour, with the aims of evaluating the expressive range of all possible dimension combinations in several scenarios, as well as discussing their influence in the fitness landscape and in the overall performance of the mixed-initiative procedural content generation.


State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

Sahni, Himanshu, Kumar, Saurabh, Tejani, Farhan, Schroecker, Yannick, Isbell, Charles

arXiv.org Machine Learning

Typical reinforcement learning (RL) agents learn to complete tasks specified by reward functions tailored to their domain. As such, the policies they learn do not generalize even to similar domains. To address this issue, we develop a framework through which a deep RL agent learns to generalize policies from smaller, simpler domains to more complex ones using a recurrent attention mechanism. The task is presented to the agent as an image and an instruction specifying the goal. This meta-controller guides the agent towards its goal by designing a sequence of smaller subtasks on the part of the state space within the attention, effectively decomposing it. As a baseline, we consider a setup without attention as well. Our experiments show that the meta-controller learns to create subgoals within the attention.